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Abstract 

Modern developers are shielded from the inner workings of com- 
puters and networks thanks to several layers of code abstraction. 
We'll dig into those layers from a single line of Perl code, down to 
the bytes that get produced at the bottom of the API stack. 
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1 Introduction 



In the 80s and 90s, anyone who hoped to write a functional script 
of any kind would have to delve deep into the inner workings of the 
machine they were working on. For network code, it would be even 
more challenging, since there were many types of networks, and many 
systems that spoke different languages, had different file structures, 
and so on. The standards and APIs were just beginning to be written, 
and more often than not that meant your code, and in turn you as a 
developer, had to understand exactly what went on deep down in the 
system. 

Now, things are obviously very different. As APIs matured, it made 
no sense for everyone to keep reinventing the wheel. As such, code 
abstraction became the norm. Whether you write in PHP, Perl, Python 
or Visual C#, you're typically dealing with functions that come from 
libraries or modules, which in turn talk to other functions, and so on 
until you end up with an unknown number of abstraction layers between 
what you write and what actually happens. This makes things easier 
by removing complexities, but it also removes us from understanding 
what really happens when we write a line of code, and creates more 
dependencies on other snippets of code which in turn may contain bugs. 

In this document, I will take a single line of Perl code, and follow it 
down through the modules, all the way to the actual bytes going out on 
the network. I picked Perl because it's a language I know, because it's 
available for free on any system, and because it's fairly easy to dig into 
its various modules. 



1.1 Audience 

This document is intended for anyone interested in coding and in the 
inner workings of computer systems. It doesn't assume any familiarity 
with Perl or a specific language, although having some type of scripting 
or coding experience would be useful, along with some experience with 
web development. While you may not understand each snippet of code, 
the main purpose is to follow the flow all the way down to the lowest 
level and realize the amount of work that goes on from a single line of 
code. 

Having Perl installed and following along isn't necessary, but it could 
provide further benefits to try and replicate each layer of abstraction, 
seeing how easy or hard it is to accomplish the same task with less and 
less dependencies. 
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2 Layers upon layers 

The function I selected for this experiment is part of the XML:: Feed[l] 
module and accomplishes much through a single line of code: 

my $feed = XML : :Feed->parse (URI->new("http : //www.reddit . com/ . rss") ) ; 

What this does is simply go out to the web and fetch an XML file, in 
this case an RSS stream from Reddit, and then returns it as a variable 
for you to parse. After importing the module and parsing the line, you 
can then access the information, in this case the latest news entries 
available on the site. Here is a more complete snippet of code you can 
try out for yourself to see the whole flow in action: 

use XML: :Feed; 

use HTML: :FormatText : :WithLinks; 

my $feed = XML : :Feed->parse (URI->new("http : //www. reddit . com/ . rss") ) ; 

foreach my $i ($f eed->entries) 

{ 

print "Title: " . $i->title . "\n"; 
print "Time: " . $i->issued . "\n"; 
print "Link: 11 . $i->link . "\n"; 

$parsed = HTML: :FormatText: : WithLinks->new(bef ore_link=> ' ' , af ter_link=> ' ' 
f ootnote=> ' ' ) ; 

print $parsed->parse ($i->content->body) . "\n\n"; 

} 

It's not necessary to understand all of that, but this code basically 
loops around each entry gathered from that web site, and then dis- 
plays the title, time, link and description of each news entry. It also 
uses the HTML: :FormatText: :WithLinks module to parse the descrip- 
tion from HTML into plain text. For this experiment however, we will 
solely concern ourselves with the line showed above. 



2.1 First layer: Parsing the XML 

Digging into the Perl API is fairly easy. If you do an online search 
for XML: :Feed you will soon find the page on the CPAN site with the 
documentation for that particular module. There, you can click the 
Source link which will show you the source code for that module. In 
our case we're interested in the parseQ function. 

The parse() function has 46 lines of code, so already after just the 
first layer, you can see how much is happening in order to accomplish 
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this one task. Here is the full source code: 

sub parse { 

my $class = shift; 

my($stream, $specif ied_f ormat) = @_; 

return $class->error ("Stream parameter is required") unless $stream; 
my $feed = bless {}, $class; 
my $xml = 11 ; 

if (UNIVERSAL: :isa($stream, 'URI')) { 
my $ua = LWP: :UserAgent->new; 
$ua->agent(__PACKAGE__ . "/$VERSI0N") ; 
$ua->env_proxy ; # force allowing of proxies 
my $res = URI : :Fetch->f etch($stream, UserAgent => $ua) 

or return $class->error (URI : :Fetch->errstr) ; 
return $class->error ("This feed has been permanently removed") 

if $res->status == URI: :Fetch: :URI_G0NE() ; 
$xml = $res->content ; 
} elsif (ref ($stream) eq 'SCALAR') { 

$xml = $$stream; 
} elsif (ref ($stream)) { 

while (read($stream, my ($ chunk) , 8192)) { 
$xml .= $ chunk; 

} 

} else { 

open my $fh, $stream 

or return $class->error( "Can't open $stream: $!"); 
while (read $fh, my ($ chunk ) , 8192) { 

$xml .= $ chunk; 

} 

close $fh; 

} 

return $class->error( "Can't get feed XML content from $stream") 

unless $xml; 
my $f ormat; 

if ($specified_f ormat) { 

$f ormat = $specified_f ormat ; 
} else { 

$f ormat = $feed->identify_f ormat (\$xml) 
or return $class->error ($f eed->errstr) ; 

} 

my $f ormat_class = join '::', __PACKAGE__, "Format", $f ormat; 
eval "use $f ormat_class" ; 

return $class->error ("Unsupported format $f ormat: $@") if $@; 
bless $feed, $f ormat_class ; 

$f eed->init_string(\$xml) or return $class->error ($f eed->errstr) ; 
$feed; 
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A lot of that is to handle errors and possible edge cases, and then 
identify what kind of XML data it is. The actual parsing of the XML is 
done in other modules, namely XML: : Feed: -.Forma t::RSS for RSS feeds, 
but we're not going to concern ourselves with that part. In reality, 
we're only interested in a small fraction of the function. What parseQ 
actually does is use the LWP: :UserAgent[2] module in order to make the 
connection on line 8, since it provides functions to handle proxy servers, 
create HTTP headers and more, then it uses URI::Fetch[3] on line 11 
which is another module that provides a convenient way of reading web 
pages, including support for various features of the HTTP protocol like 
compression, caching, error codes and more. Then, it uses the built-in 
Perl readQ function in order to read the incoming data line by line. Here 
are the relevant lines: 

my $ua = LWP: :UserAgent->new; 

my $res = URI : :Fetch->f etch($stream, UserAgent => $ua) 

open my $fh, $stream 

while (read $fh, my ($ chunk ) , 8192) { 
$xml .= $ chunk; 

} 



Through these two additional modules, along with the native read() 
function, the first abstraction layer can accomplish much through what 
is still a fairly small amount of code. Now it's time to go down to the 
second layer. 



2.2 Second layer: Setting up the connection 

2.2.1 Setting a user agent 

Before being able to read the file, a connection to the web server has 
to be made. If you recall from the last section, this is done by using the 
new function from the LWP:: UserAgent function. While this module is 
fully able to actually go out and make the connection over the network, 
in this case the author of the previous layer selected to use new() and 
then do some more processing first. Let's look at the code: 

sub new 
{ 

# Check for common user mistake 
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Carp: : croak ("Options to LWP: :UserAgent should be key/value pairs, not hash 
reference") 

if ref ($_[1]) eq 'HASH' ; 

my($class, °/ 0 cnf) = @_; 

my $agent = delete $cnf {agent}; 
my $from = delete $cnf{from}; 

my $def_headers = delete $cnf {def ault_headers}; 

my $timeout = delete $cnf {timeout} ; 

$timeout = 3*60 unless defined $timeout; 

my $local_address = delete $cnf {local_address}; 

my $ssl_opts = delete $cnf {ssl_opts} II {}; 

unless (exists $ssl_opts->{verif y_hostname}) { 

# The processing of HTTPS_CA_* below is for compatibility with Crypt :: SSLeay 
if (exists $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME}) { 

$ssl_opts->{verif y_hostname} = $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} ; 

} 

elsif ($ENV{HTTPS_CA_FILE} | | $ENV{HTTPS_CA_DIR}) { 

# Crypt-SSLeay compatibility (verify peer certificate; but not the hostname) 
$ssl_opts->{verif y_hostname} = 0; 
$ssl_opts->{SSL_verif y_mode} = 1; 

} 

else { 

$ssl_opts->{verif y_hostname} = 1; 

} 
} 

unless (exists $ssl_opts->{SSL_ca_f ile}) { 

if (my $ca_f ile = $ENV{PERL_LWP_SSL_CA_FILE} | | $ENV{HTTPS_CA_FILE}) { 
$ssl_opts->{SSL_ca_f ile} = $ca_file; 

} 
} 

unless (exists $ssl_opts->{SSL_ca_path}) { 

if (my $ca_path = $ENV{PERL_LWP_SSL_CA_PATH} | | $ENV{HTTPS_CA_DIR}) { 
$ssl_opts->{SSL_ca_path} = $ca_path; 

} 
} 

my $use_eval = delete $cnf {use_eval} ; 

$use_eval = 1 unless defined $use_eval; 

my $parse_head = delete $cnf {parse_head} ; 

$parse_head = 1 unless defined $parse_head; 

my $show_progress = delete $cnf {show_progress}; 

my $max_size = delete $cnf {max_size} ; 

my $max_redirect = delete $cnf {max_redirect} ; 

$max_redirect = 7 unless defined $max_redirect ; 

my $env_proxy = exists $cnf {env_proxy} ? delete $cnf {env_proxy} : 
$ENV{PERL_LWP_ENV_PROXY} ; 



my $cookie_jar = delete $cnf {cookie_jar} ; 
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53 my $conn_cache = delete $cnf {conn_cache} ; 

54 my $keep_alive = delete $cnf {keep_alive} ; 

55 

56 Carp :: croak (" Can ' t mix conn_cache and keep_alive") 

57 if $conn_cache && $keep_alive; 

58 

59 my $protocols_allowed = delete $cnf {protocols_allowed} ; 

60 my $protocols_f orbidden = delete $cnf {protocols_f orbidden} ; 

61 

62 my $requests_redirectable = delete $cnf {requests_redirectable} ; 

63 $requests_redirectable = ['GET', 'HEAD'] 

64 unless defined $requests_redirectable; 

65 

66 # Actually ""s are just as good as 0's, but for concision we'll just say: 

67 Carp: :croak("protocols_allowed has to be an arrayref or 0, not 
es \"$protocols_allowed\" ! ") 

69 if $protocols_allowed and ref ($protocols_allowed) ne 'ARRAY'; 

70 Carp: : croak("protocols_f orbidden has to be an arrayref or 0, not 

71 \"$protocols_f orbidden\" ! ") 

72 if $protocols_f orbidden and ref ($protocols_f orbidden) ne 'ARRAY'; 

73 Carp: :croak("requests_redirectable has to be an arrayref or 0, not 

74 \"$requests_redirectable\" ! ") 

75 if $requests_redirectable and ref ($requests_redirectable) ne 'ARRAY'; 

76 
77 

78 if (%cnf && $~W) { 

79 Carp: : carp ("Unrecognized LWP: :UserAgent options: ©{[sort keys °/„cnf]}"); 

80 } 

81 

82 my $self = bless { 

83 def_headers => $def _headers, 

84 timeout => $timeout, 

85 local_address => $local_address , 

86 ssl_opts => $ssl_opts, 

87 use_eval => $use_eval, 

88 show_progress=> $show_progress , 

89 max_size => $max_size, 

90 max_redirect => $max_redirect , 

91 proxy => {}, 

92 no_proxy => [] , 

93 protocols_allowed => $protocols_allowed, 

94 protocols_f orbidden => $protocols_f orbidden, 

95 requests_redirectable => $requests_redirectable, 

96 }, $class; 

97 

98 $self ->agent (def ined($agent) ? $agent : $class->_agent) 

if def ined($agent) II ! $def _headers II ! $def_headers->header ("User-Agent ") ; 

ioo $self ->f rom($f rom) if $from; 

$self ->cookie_jar ($cookie_jar) if $cookie_jar; 
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$self ->parse_head($parse_head) ; 
$self ->env_proxy if $env_proxy; 



104 



105 



106 



$self ->protocols_allowed( $protocols_allowed ) if $protocols_allowed; 
$self ->protocols_f orbidden($protocols_f orbidden) if $protocols_f orbidden; 



107 



109 



108 
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110 



if ($keep_alive) { 

$conn_cache ||= { total_capacity => $keep_alive }; 
} 

$self ->conn_cache ($conn_cache) if $conn_cache; 



112 



113 



return $self ; 



Okay, so the amount of code is starting to be staggering. Fortunately, 
again only a small portion is of interest for our purposes. Basically, the 
goal of this function is to create a new instance of the LWP: : User Agent 
class and configure various parameters to do with the upcoming con- 
nection, such as the user agent to pass to the server, how long the 
connection should stay active before timing out, which HTTP headers 
should be sent out, how to handle encryption in the case of SSL web 
sites, how to store any cookies that the site decides to send, and so on. 

As you can imagine, if every developer had to worry about all of these 
things every time they wanted to fetch a file from a web site, it would 
be quite inconvenient. In our particular case, no parameter is passed 
out to the new() function so we're basically accepting all the defaults, 
then parse() above is making two additional changes, namely setting a 
custom user agent, and copying the system-wide proxy settings to the 
function: 

$ua->agent(__PACKAGE__ . " /$VERSI0N") ; 
2 $ua->env_proxy ; # force allowing of proxies 

Take note of what the user agent is being set to. When we get down 
through the layers and look at the code that actually goes out on the 
network, we'll get to see it in action. 



2.2.2 Opening a stream 

So far we've gotten a LWP: :UserAgent object, and now we need to 
open up a stream. If you recall from section 2.1, the user agent variable 
is $ua, which is then passed to the fetchQ function from the URI: : Fetch 
module. Let's look at this latest one. Be ready to scroll: 

i sub fetch { 
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2 my $class = shift; 

3 my($uri, °/ 0 param) = @_; 

4 

5 # get user parameters 

6 my $cache = delete $param{Cache} ; 

7 my $ua = delete $param{User Agent} ; 

8 my $p_etag = delete $param{ETag} ; 

9 my $p_lastmod = delete $param{LastModif ied}; 

my $content_hook = delete $param{ContentAlterHook} ; 

n my $p_no_net = delete $param{NoNetwork} ; 

12 my $p_cache_grep = delete $param{CacheEntryGrep} ; 

13 my $freeze = delete $param{Freeze} ; 

14 my $thaw = delete $param{Thaw} ; 

is my $force = delete $param{ForceResponse} ; 

16 croak ("Unknown parameters: " . join(", ", keys %param)) 

17 if %param; 

18 

19 my $ref ; 

20 if ($cache) { 

21 unless ($freeze && $thaw) { 

22 require Storable; 

23 $thaw = \&Storable : :thaw; 

24 $freeze = \&Storable: : freeze; 

25 } 

26 if (my $blob = $cache->get ($uri) ) { 

27 $ref = $thaw->($blob) ; 

28 } 

29 } 

30 

31 # NoNetwork support (see pod docs below for logic clarification) 

32 if ($p_no_net) { 

croak ("Invalid NoNetworkValue (negative)") if $p_no_net < 0; 

if ($ref && ($p_no_net == 1 | | $ref ->{CacheTime} > time() - $p_no_net)) { 

35 my $fetch = URI :: Fetch : :Response->new; 

36 $f etch->status(URI_OK) ; 

37 $f etch->content ($ref ->{Content}) ; 

38 $f etch->etag($ref->{ETag}) ; 

39 $f etch->last_modif ied($ref ->{LastModif ied}) ; 

40 $f etch->content_type ($ref ->{ContentType}) ; 

41 return $fetch; 

42 } 

43 return undef if $p_no_net == 1; 

44 } 

45 

46 $ua ||= do { 

47 my $ua = LWP: :UserAgent->new; 

48 $ua->agent(join '/', $class, $class->VERSI0N) ; 

49 $ua->env_proxy ; 
$ua; 
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>; 

my $req = HTTP : :Request->new (GET => $uri) ; 
if ($HAS_ZLIB) { 

$req->header ( 1 Accept-Encoding ' , ' gzip 1 ) ; 

} 

if (my $etag = ($p_etag || $ref->{ETag}) ) { 
$req->header ( ' If -None-Match ' , $etag) ; 

} 

if (my $ts = ($p_lastmod || $ref ->{LastModif ied}) ) { 
$req->if _modif ied_since ($ts) ; 

} 

my $res = $ua->request ($req) ; 

my $fetch = URI :: Fetch : :Response->new; 

$fetch->uri($uri) ; 

$f etch->http_status ($res->code) ; 

$f etch->http_response ($res) ; 

$f etch->content_type ($res->header ( ' Content -Type ' ) ) ; 
if ($res->previous && $res->previous->code == 
HTTP : : Status : : RC_MOVED_PERMANENTLY() ) { 

$f etch->status (URI_MDVED_PERMANENTLY) ; 

$f etch->uri($res->previous->header( 'Location' )) ; 
} elsif ($res->code == HTTP: : Status : :RC_GDNE()) { 

$fetch->status(URI_GONE) ; 

$f etch->uri (undef ) ; 

return $fetch; 

} elsif ($res->code == HTTP: : Status : :RC_NOT_MDDIFIED() ) { 
$f etch->status (URI_NDT_MODIFIED) ; 
$f etch->content ($ref ->{Content}) ; 
$fetch->etag($ref->{ETag}) ; 

$f etch->last_modif ied($ref ->{LastModif ied}) ; 
$f etch->content_type ($ref ->{ContentType}) ; 
return $fetch; 
} elsif ( ! $res->is_success) { 

return $force ? $fetch : $class->error ($res->message) ; 

} else { 

$fetch->status(URI_OK) ; 

} 

$f etch->last_modif ied($res->last_modif ied) ; 
$fetch->etag($res->header( 'ETag' )) ; 
my $content = $res->content ; 

if ($res->content_encoding && $res->content_encoding eq 'gzip') { 
$content = Compress: :Zlib: :memGunzip($content) ; 

} 

# let caller-defined transform hook modify the result that'll be 

# cached, perhaps the caller only wants the <head> section of 



A study of code abstraction 



10 



100 # HTML, or wants to change the content to a parsed datastructure 

101 # already serialized with Storable. 

102 if ($content_hook) { 

croak("ContentAlterHook is not a subref") unless ref $content_hook eq 

104 "CODE"; 

105 $content_hook->(\$content) ; 

106 } 

107 

108 $f etch->content ($content) ; 

109 

no # cache by default, if there's a cache, but let callers cancel 

in # the cache action by defining a cache grep hook 

112 if ($cache && 

113 ($p_cache_grep ? $p_cache_grep-> ($f etch) : 1)) { 

114 

115 $cache->set ($uri , $freeze->({ 

116 ETag => $f etch->etag, 

117 LastModif ied => $f etch->last_modif ied, 
us Content => $f etch->content , 

119 CacheTime => time(), 

120 Content Type => $f etch->content_type, 

})); 

122 } 

123 $fetch; 



124 } 

Let's break down what the function does. First, it accepts a number 
of parameters which get set after line 5. As we've seen in 2.1, only two 
get passed on in our case: the stream variable, which will be used to 
read the information from the network, and the user agent class. Then, 
this function has a number of conditional statements to deal with all of 
these potential parameters. In our case, most of them are ignored since 
we aren't dealing with cache, serialization, content handling, and so on. 
Instead, we go right into the interesting part at line 53 which deals with 
opening the network connection, and then after line 64, dealing with 
the response from the server. Here is the relevant code: 



my $req = HTTP : :Request->new (GET => $uri) ; 

2 ... 

my $res = $ua->request ($req) ; 
my $fetch = URI :: Fetch : :Response->new; 
s $fetch->uri($uri) ; 



As you can see, once again this function doesn't actually have any 
network code. It creates a new object from the HTTP: : Request[4] mod- 
ule, then uses the requestQ function from LWP: : User Agent to make the 
request on the opened connection, which will be our third abstraction 
layer. Then, we see another new module being used, URI: : Fetch :: Response[5] 
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in order to parse the various response codes. A web server can return a 
number of codes along with the normal HTTP headers, such as whether 
the connection was successful, if the file was moved, if the requested 
item doesn't exist, and so on. This is what we see happening in the rest 
of the function. 



2.3 Third layer: Making the request 

Let's recap. So far, we've been through two different abstraction lay- 
ers. After using the XMLr.Feed module to parse the XML file from Red- 
dit, we go down into LWPr.UserAgent and URI: -.Fetch in order to make 
a new user agent object, and pass it to fetch() in order to get data from 
a site. Now, we go down one more layer into HTTP: : Request- > new () 
to prepare the request and LWP: :UserAgent->request() to send it out. 



2.3.1 Preparing the request 

The first thing that the previous layer does is calling the new() function 
from HTTP: : Request. Let's look at the code: 

sub new 
{ 

my($class, $method, $uri, $header, $content) = @_; 
my $self = $class->SUPER: :new($header , $content) ; 
$self->method($method) ; 
$self->uri($uri) ; 
$self ; 

} 

While this is a tiny function, the amount of work it does is deceiv- 
ing. On line 4, it actually calls the new() function of its base module, 
HTTP: :Message[6] . We'll skip that one because all it does is set the 
default headers for the upcoming connection. The next line sets the 
method for the connection, which is a parameter passed by the previous 
layer. If you remember, that method was GET. Any HTTP connection 
must have a valid method. GET is usually used to fetch information, 
while POST is used to send form data, such as logging into a web site. 
Finally, the uri() function simply parses the URL passed to make sure 
it's valid, returning various error messages in case it isn't. 
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2.3.2 Sending the request 

After making a new object of the HTTP: : Request type, the previous 
abstraction layer called the requestQ function from the LWP: :UserAgent 
module. Let's see its source code: 

sub request 
{ 

my($self, $request, $arg, $size, $previous) = @_; 

my $response = $self ->simple_request ($request , $arg, $size) ; 
$response->previous ($previous) if $previous; 

if ($response->redirects >= $self ->{max_redirect}) { 
$response->header ( "Client-Warning" => 

"Redirect loop detected (max_redirect = 
$self->{max_redirect}) ") ; 
return $response; 

} 

if (my $req = $self->run_handlers("response_redirect" , $response)) { 
return $self ->request ($req, $arg, $size, $response) ; 

} 

my $code = $response->code ; 

if ($code == &HTTP: : Status: : RC_MDVED_PERMANENTLY or 

$code == &HTTP: : Status : :RC_F0UND or 

$code == &HTTP: : Status : : RC_SEE_OTHER or 

$code == &HTTP: : Status: :RC_TEMPORARY_REDIRECT) 

{ 

my $referral = $request->clone ; 

# These headers should never be forwarded 
$ref erral->remove_header ( ' Host ' , ' Cookie 1 ) ; 

if ($referral->header('Referer') kk 

$request->uri->scheme eq 'https' kk 
$ref erral->uri->scheme eq 'http') 

{ 

# RFC 2616, section 15.1.3. 

# https -> http redirect, suppressing Referer 
$ref erral->remove_header ( 'Referer' ) ; 

} 

if ($code == &HTTP: : Status: :RC_SEE_OTHER || 
$code == &HTTP: : Status: :RC_F0UND) 
{ 

my $method = uc($ref erral->method) ; 
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unless ($method eq "GET" | | $method eq "HEAD") { 

$referral->method("GET") ; 

$ref erral->content ( " " ) ; 

$ref erral->remove_content_headers ; 

} 



# And then we update the URL based on the Location: -header. 

my $ref erral_uri = $response->header ( 'Location' ) ; 

{ 

# Some servers erroneously return a relative URL for redirects, 

# so make it absolute if it not already is . 
local $URI : : ABS_ALLOW_RELATIVE_SCHEME = 1; 
my $base = $response->base ; 

$ref erral_uri = "" unless defined $ref erral_uri ; 
$referral_uri = $HTTP: :URI_CLASS->new($ref erral_uri, $base) 
->abs ($base) ; 

} 

$ref erral->uri ($ref erral_uri) ; 

return $response unless $self ->redirect_ok($ref erral , $response) ; 
return $self ->request ($ref erral, $arg, $size, $response) ; 

} 

elsif ($code == &HTTP: : Status: :RC_UNAUTHORIZED || 

$code == &HTTP: : Status: :RC_PROXY_AUTHENTICATION_REQUIRED 

) 

{ 

my $proxy = ($code == &HTTP: : Status: :RC_PROXY_AUTHENTICATIDN_REQUIRED) ; 
my $ch_header = $proxy || $request->method eq ' CONNECT ' 

? "Proxy-Authenticate" : "WWW-Authenticate" ; 
my ©challenge = $response->header ($ch_header) ; 
unless (©challenge) { 

$response->header ( "Client-Warning" => 

"Missing Authenticate header"); 

return $response; 

} 

require HTTP: : Headers : :Util; 

CHALLENGE: for my $challenge (©challenge) { 

$challenge =~ tr/,/;/; # "," is used to separate auth-params ! ! 

($challenge) = HTTP: :Headers: :Util: : split_header_words ($challenge) ; 

my $scheme = shift (@$challenge) ; 

shift (@$challenge) ; # no value 

$challenge = { @$challenge }; # make rest into a hash 

unless ($scheme =~ /" ( [a-z] +(? : - [a-z] +) *) $/) { 
$response->header ( "Client-Warning" => 

"Bad authentication scheme '$scheme'"); 
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93 return $response; 

94 } 

95 $scheme = $1; # untainted now 

96 my $class = "LWP: :Authen: :\u$scheme" ; 

97 $class =~ s/-/_/g; 

98 

99 no strict 'refs'; 

100 unless (%{"$class\ : : "}) { 

101 # try to load it 

102 eval "require $class"; 

103 if ($0) { 

104 if ($0 =~ /"Cant locate/) { 

105 $response->header ("Client-Warning" => 

106 "Unsupported authentication scheme '$scheme'"); 

107 } 

108 else { 

109 $response->header ("Client-Warning" => $@) ; 
no } 

next CHALLENGE; 

112 } 

113 } 

114 unless ($class->can("authenticate") ) { 
us $response->header ("Client-Warning" => 

116 "Unsupported authentication scheme '$scheme'"); 

117 next CHALLENGE; 

118 } 

return $class->authenticate ($self , $proxy, $challenge, $response, 

120 $request, $arg, $size) ; 

121 } 

122 return $response; 

123 } 

124 return $response; 



125 } 

While this is a massive function, it's actually just half of the story. Its 
main purpose is to parse the headers received by the server and act 
on them. For example, line 21 checks whether the server said that the 
requested file was moved, and if so, makes another request to the new 
address. On line 72 it also handles the case where a proxy server is 
required, and whether that proxy needs authentication. 

But before all of that can happen, we still need to open the actual con- 
nection, we need the function that tells the system to open a network 
socket. This happens on line 5. The simple_request() function actu- 
ally does some more preparation tasks and then calls send_request(). 
That's the second half of the story for this layer of abstraction: 

i sub send_request 
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my($self, $request, $arg, $size) = @_; 

my($method, $url) = ($request->method, $request->uri) ; 

my $scheme = $url->scheme ; 

local($SIG{ DIE }) ; # protect against user defined die handlers 

$self ->progress ("begin" , $request) ; 

my $response = $self->run_handlers("request_send" , $request) ; 

unless ($response) { 
my $protocol; 

{ 

# Honor object-specific restrictions by forcing protocol objects 

# into class LWP: : Protocol : :nogo. 
my $x; 

if($x = $self->protocols_allowed) { 
if (grep lc($_) eq $scheme, @$x) { 
} 

else { 

require LWP: : Protocol : :nogo; 
$protocol = LWP: :Protocol: :nogo->new; 

} 

} 

elsif ($x = $self ->protocols_f orbidden) { 
if (grep lc($_) eq $scheme, @$x) { 
require LWP: : Protocol : :nogo; 
$protocol = LWP: : Protocol : :nogo->new; 

} 

} 

# else fall thru and create the protocol object normally 

} 

# Locate protocol to use 

my $proxy = $request->{proxy}; 

if ($proxy) { 

$scheme = $proxy->scheme ; 

} 

unless ($protocol) { 

$protocol = eval { LWP: : Protocol :: create ($scheme, $self) }; 
if ($@) { 

$0 =~ s/ at .* line \d+.*//s; # remove file/line number 
$response = _new_response ($request , 
&HTTP: : Status: :RC_N0T_ IMPLEMENTED, $0) ; 
if ($scheme eq "https") { 

$response->message($response->message . " 
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(LWP: : Protocol : :https not installed)"); 

$response->content_type ( "text /plain" ) ; 

$response->content (<<E0T) ; 
LWP will support https URLs if the LWP: : Protocol: rhttps module 
is installed. 
EOT 

} 

} 

} 

if ( ! $response && $self ->{use_eval}) { 

# we eval, and turn dies into responses below 
eval { 

$response = $protocol->request ($request , $proxy, $arg, $size, 
$self->{timeout}) | | 
die "No response returned by $protocol"; 
}; 

if ($@) { 

if (UNIVERSAL: :isa($@, "HTTP: : Response")) { 
$response = $0; 
$response->request ($request) ; 

} 

else { 

my $full = $0; 

(my $status = $0) =~ s/\n.*//s; 

$status =~ s/ at .* line \d+.*//s; # remove file/line number 
my $code = ($status =~ s/~ (\d\d\d) \s+//) ? $1 : 
&HTTP: : Status: : RC_INTERNAL_SERVER_ERROR; 

$response = _new_response ($request , $code, $status, $full) ; 

} 

} 

} 

elsif (!$response) { 

$response = $protocol->request ($request , $proxy, 

$arg, $size, $self ->{timeout}) ; 

# XXX: Should we die unless $response->is_success ??? 

} 

} 

$response->request ($request) ; # record request for reference 
$response->header ("Client-Date" => HTTP: :Date: :time2str (time) ) ; 

$self->run_handlers("response_done" , $response) ; 

$self ->progress ("end" , $response) ; 
return $response; 

} 
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Again, we see a lot more handling of the request, error handling, 
and various edge cases. Little is of interest in here, except for line 64. 
Here we get introduced to another module which will make up the next 
abstraction layer: LWP: :Protocol[7]. All of the information we've dealt 
with so far, including the request from HTTP: : Request, the user agent, 
optional arguments, are all passed onto that new module. 



2.4 Fourth layer: Defining protocols 

If you look at the new() function for the LWP: .-Protocol module, you 
may be left a bit confused: 

sub create 
{ 

my($scheme, $ua) = @_; 

my $impclass = LWP :: Protocol :: implementor ($scheme) or 
Carp: : croak ("Protocol scheme ' $scheme ' is not supported"); 

# hand-off to scheme specific implementation sub-class 
my $protocol = $impclass->new($scheme , $ua) ; 

return $protocol; 

} 

This is supposed to be the key module to do all of the network stuff 
we've been looking for since the start. What this actually does is hand 
off all of the work to the proper subclass. This module has sub-modules 
for each type of request, including LWP: .-Protocol: :http for HTTP re- 
quests, LWP: .-Protocol .-file for files, and so on. But if you go on the 
CPAN site and try to look at those sub-modules, you may find them 
suspiciously missing. This is because for the first time so far, this layer 
has been hidden from us. While developers are expected to work with 
any of the previous modules, now we've finally delved deep enough that 
for normal use cases, it's been decided that we're now entering a layer 
deep enough that we shouldn't mess with it. Here be dragons... 

Of course, Perl is open source, and the whole point of this experiment 
is to break away those layers, so we're not going to let that stop us. If 
you go on the source repository for the module[8], you can find what 
we need. Here's the code for the massive requestQ function: 

sub request 
{ 

my($self, $request, $proxy, $arg, $size, $timeout) = @_; 
$size | |= 4096; 
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# check method 

my $method = $request->method; 

unless ($method =~ /~ [A-Za-z0-9_! \#\$%&\ 1 *+\-. ~V I -]+$/) { # HTTP token 
return HTTP: :Response->new( &HTTP: : Status : :RC_BAD_REQUEST, 

'Library does not allow method ' . 

"$method for 'http: 1 URLs"); 

} 

my $url = $request->uri; 
my($host, $port, $f ullpath) ; 

# Check if we're proxy ' ing 
if (defined $proxy) { 

# $proxy is an URL to an HTTP server which will proxy this request 
$host = $proxy->host ; 

$port = $proxy->port ; 

$fullpath = $method eq "CONNECT" ? 

($url->host . ":" . $url->port) : 

$url->as_str ing ; 

} 

else { 

$host = $url->host; 
$port = $url->port ; 
$f ullpath = $url->path_query ; 

$fullpath = "/$fullpath" unless $fullpath =~ m,~/,; 
} 

# connect to remote site 

my $socket = $self ->_new_socket ($host , $port, $timeout) ; 

my $http_version = ""; 
if (my $proto = $request->protocol) { 
if ($proto =~ /~(?:HTTP\/)?(l.\d+)$/) { 
$http_version = $1; 

$socket->http_version($http_version) ; 
$socket->send_te (0) if $http_version eq "1.0"; 

} 
} 

$self ->_check_sock($request , $socket) ; 
my Oh; 

my $request_headers = $request->headers->clone ; 
$self->_f ixup_header($request_headers, $url, $proxy) ; 

$request_headers->scan(sub { 
my($k, $v) = @_; 
$k =~ s/~://; 
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55 $v =~ s/W /g; 

se push(@h, $k, $v) ; 

57 }); 

58 

59 my $content_ref = $request->content_ref ; 

60 $content_ref = $$content_ref if ref ($$content_ref ) ; 

61 my $chunked; 

62 my $has_content ; 

63 

64 if (ref ($content_ref ) eq 'CODE') { 

65 my $clen = $request_headers->header ( ' Content -Length ' ) ; 

66 $has_content++ if $clen; 

67 unless (defined $clen) { 

68 push(@h, "Transfer-Encoding" => "chunked"); 

69 $has_content++; 

70 $chunked++; 
} 

72 } 

73 else { 

74 # Set (or override) Content -Length header 

75 my $clen = $request_headers->header ( ' Content -Length ' ) ; 

76 if (def ined($$content_ref ) && length ($$content_ref ) ) { 

77 $has_content = length ($$content_ref) ; 

78 if ( ! def ined($clen) II $clen ne $has_content) { 

79 if (defined $clen) { 

so warn "Content -Length header value was wrong, fixed"; 

si hlist_remove (\@h, 'Content-Length'); 

82 } 

83 push(@h, ' Content -Length ' => $has_content) ; 

84 } 

85 } 

86 elsif ($clen) { 

87 warn "Content -Length set when there is no content, fixed"; 

88 hlist_remove (\@h, ' Content -Length ') ; 

89 } 

90 } 

91 

92 my $write_wait = 0; 

93 $write_wait = 2 

if ($request_headers->header ("Expect") || "") =~ /100-continue/ ; 

95 

96 my $req_buf = $socket->f ormat_request ($method, $fullpath, @h) ; 

97 #print " \n$req_buf\n \n" ; 

98 

99 if ( ! $has_content || $write_wait || $has_content > 8*1024) { 

100 WRITE: 

101 { 

102 # Since this just writes out the header block it should almost 

103 # always succeed to send the whole buffer in a single write call. 
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my $n = $socket->syswrite ($req_buf , length ($req_buf )) ; 
105 unless (defined $n) { 

ice redo WRITE if $!{EINTR}; 

107 if ($!{EAGAIN» { 

108 select (undef , undef, undef, 0.1); 

109 redo WRITE; 

no } 

in die "write failed: $!"; 

112 } 

113 if ($n) { 

114 substr($req_buf , 0, $n, ""); 
} 

116 else { 

117 select (undef , undef, undef, 0.5); 

118 } 

119 redo WRITE if length $req_buf ; 

120 } 

121 } 

122 

123 my($code, $mess, Ojunk) ; 

124 my $drop_connection; 

125 

126 if ($has_content) { 

127 my $eof ; 

128 my $wbuf ; 

129 my $woffset = 0; 

130 INITIAL_READ : 

131 if ($write_wait) { 

132 # skip filling $wbuf when waiting for 100-continue 

133 # because if the response is a redirect or auth required 

134 # the request will be cloned and there is no way 

135 # to reset the input stream 

136 # return here via the label after the 100-continue is read 

137 } 

138 elsif (ref ($content_ref ) eq 'CODE') { 

139 my $buf = &$content_ref () ; 

wo $buf = "" unless def ined($buf ) ; 

$buf = sprintf "°/U7 0 s°/ 0 s 0 / 0 s" , length($buf ) , $CRLF , $buf, $CRLF 

142 if $chunked; 

143 substr($buf, 0, 0) = $req_buf if $req_buf; 

144 $wbuf = \$buf; 

145 } 

146 else { 

147 if ($req_buf) { 

148 my $buf = $req_buf . $$content_ref ; 

149 $wbuf = \$buf; 

150 } 

151 else { 

152 $wbuf = $content_ref ; 
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153 } 

154 $eof = 1; 
} 

156 

157 my $fbits = 11 ; 

158 vec($fbits, f ileno ($socket) , 1) = 1; 

159 

160 WRITE : 

lei while ($write_wait || $woffset < length ($$wbuf ) ) { 

162 

163 my $sel_timeout = $timeout; 

164 if ($write_wait) { 

165 $sel_timeout = $write_wait if $write_wait < $sel_timeout ; 

166 } 

167 my $time_bef ore; 

168 $time_bef ore = time if $sel_timeout ; 

169 

170 my $rbits = $fbits; 

171 my $wbits = $write_wait ? undef : $fbits; 

172 my $sel_timeout_bef ore = $sel_timeout ; 

173 SELECT: 

174 { 

175 my $nfound = select ($rbits , $wbits, undef, $sel_timeout) ; 

176 if ($nfound < 0) { 

177 if ($!{EINTR> || $ ! {EAGAIN}) { 

178 if ($time_bef ore) { 

179 $sel_timeout = $sel_timeout_bef ore - (time - 

180 $time_bef ore) ; 

181 $sel_timeout = 0 if $sel_timeout < 0; 

182 } 

183 redo SELECT; 

184 } 

185 die "select failed: $!"; 

186 } 

187 } 

188 

189 if ($write_wait) { 

190 $write_wait -= time - $time_bef ore ; 

191 $write_wait = 0 if $write_wait < 0; 

192 } 

193 

194 if (def ined($rbits) && $rbits =- /[~\0]/) { 

195 # readable 

196 my $buf = $socket->_rbuf ; 

197 my $n = $socket->sysread($buf , 1024, length($buf ) ) ; 

198 unless (defined $n) { 

199 die "read failed: $!" unless $ ! {EINTR} || $ ! {EAGAIN} ; 

200 # if we get here the rest of the block will do nothing 

201 # and we will retry the read on the next round 
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202 } 

203 elsif ($n == 0) { 

204 # the server closed the connection before we finished 

205 # writing all the request content. No need to write any more. 

206 $drop_connection++; 

207 last WRITE; 

208 } 

209 $socket->_rbuf ($buf ) ; 

210 if (!$code kk $buf =~ /\015?\012\015?\012/) { 

211 # a whole response header is present, so we can read it without blocking 

212 ($code, $mess, @h) = $socket->read_response_headers (laxed => 1, 

213 junk_out => \@junk, 

214 ) ; 

215 if ($code eq "100") { 

216 $write_wait = 0; 

217 undef ($code) ; 

218 goto INITIAL_READ; 

219 } 

220 else { 

221 $drop_connection++; 

222 last WRITE; 

223 # XXX should perhaps try to abort write in a nice way too 

224 } 

225 } 

226 } 

227 if (def ined($wbits) kk $wbits =- /[~\0]/) { 

228 my $n = $socket->syswrite ($$wbuf , length($$wbuf ) , $wof f set) ; 

229 unless (defined $n) { 

230 die "write failed: $!" unless $ ! {EINTR} || $ ! {EAGAIN} ; 

231 $n = 0; # will retry write on the next round 

232 } 

233 elsif ($n == 0) { 

234 die "write failed: no bytes written"; 

235 } 

236 $woffset += $n; 

237 

238 if (!$eof kk $woffset >= length($$wbuf ) ) { 

239 # need to refill buffer from $content_ref code 

240 my $buf = &$content_ref () ; 

241 $buf = "" unless def ined($buf ) ; 

242 $eof++ unless length ($buf) ; 

243 $buf = sprintf "%x%s°/„s%s" , length($buf ) , $CRLF, $buf, $CRLF 

244 if $chunked; 

245 $wbuf = \$buf; 

246 $woffset = 0; 

247 } 

248 } 

249 } # WRITE 

250 } 
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251 

252 ($code, $mess, @h) = $socket->read_response_headers (laxed => 1, 

253 junk_out => \@junk) 

254 unless $code; 

255 ($code, $mess, @h) = $socket->read_response_headers (laxed => 1, 

256 junk_out => \@junk) 

257 if $code eq "100"; 

258 

259 my $response = HTTP: :Response->new($code, $mess) ; 

260 my $peer_http_version = $socket->peer_http_version; 

261 $response->protocol("HTTP/$peer_http_version") ; 

262 { 

263 local $HTTP: : Headers : :TRANSLATE_UNDERSCORE; 

264 $response->push_header (@h) ; 

265 } 

266 $response->push_header ( "Client- Junk" => \@junk) if Ojunk; 

267 

268 $response->request ($request) ; 

269 $self ->_get_sock_inf o($response, $socket) ; 

270 

271 if ($method eq "CONNECT") { 

272 $response->{client_socket} = $socket; # so it can be picked up 

273 return $response; 

274 } 

275 

276 if (my @te = $response->remove_header ( ' Transf er-Encoding ' ) ) { 

277 $response->push_header ( 'Client -Transf er-Encoding' , \@te) ; 

278 } 

279 $response->push_header ( 'Client -Response-Num ' , scalar 

280 $socket->increment_response_count) ; 

281 

282 my $complete; 

283 $response = $self ->collect ($arg, $response, sub { 

284 my $buf = ""; #prevent use of uninitialized value in SSLeay.xs 

285 my $n; 

286 READ : 

287 { 

288 $n = $socket->read_entity_body ($buf , $size) ; 

289 unless (defined $n) { 

290 redo READ if $ ! {EINTR} || $ ! {EAGAIN} ; 

291 die "read failed: $!"; 

292 } 

293 redo READ if $n == -1; 

294 } 

295 $complete++ if !$n; 

296 return \$buf ; 

297 } ); 

298 $drop_connection++ unless $complete; 

299 



A study of code abstraction 



24 



300 Oh = $socket->get_trailers ; 

301 if (Oh) { 

302 local $HTTP: : Headers: :TRANSLATE_UNDERSCORE; 

303 $response->push_header (Oh) ; 

304 } 

305 

306 # keep-alive support 

307 unless ($drop_connection) { 

308 if (my $conn_cache = $self ->{ua}{conn_cache}) { 

309 my ^connection = map { (lc($_) => 1) } 

310 split (As* , \s*/ , ($response->header ("Connection") II "")); 
if ( ($peer_http_version eq "1.1" && ! $connection{close}) II 

312 $connection{"keep-alive"}) 

313 { 

$conn_cache->deposit ($self ->socket_type , "$host : $port " , $socket) ; 

315 } 

316 } 

317 } 

318 

319 $response; 



320 y 

The code starts by checking if the input parameters are valid, whether 
there's a proxy or not, and then on line 35, you can see a call to the 
_new_socket() function inside the same module. This function calls the 
IO: .'Socket: : IN ET[9] module in order to create a new socket, which is 
a computer's way to open a logical connection to a network resource, 
such as a web server. After that, the function adds various information 
on that socket, including the version of the HTTP protocol used on line 
40, and the default headers on line 49. It defines how much content it's 
sending on line 75, then actually writes out the headers on the socket 
starting at line 100 using the syswriteQ function. 

After that, it reads from the socket starting at line 130 using sys- 
read(), and starts parsing the headers at line 252. The syswrite() and 
sysreadQ functions are built-in Perl functions that can read and write 
to a stream, such as the opened network socket. Of more interest is 
the IO: -.Socket: : IN ET module which builds upon 7O//Soc/cet[10]. Let's 
go down another abstraction level. 



2.5 Fifth layer: I/O Sockets 

IO simply means input and output. With the IO: -.Socket module we're 
now deep enough for things to be generalized greatly. We're no longer 
talking about XML data, or even HTTP connections. We're now deep 
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enough on the OSI model[ll] to be talking directly to network drivers 
in your operating system. This is where an Internet socket, which is 
an endpoint of an inter-process communication flow across a computer 
network, is created. 

The first thing the previous layer did was call the bindingQ function 
of the IOr.Socket module, which in turn calls the ioctlQ Perl function, 
which is a system call to set whether the socket will be blocking (stops 
execution of the code while waiting for all the data to be sent or re- 
ceived) or non-blocking (allows the code to continue while partial data 
is on the line). Then, the actual socket is created. 

To see what actually goes on when creating a socket, let's look at 
the configureQ function which makes all the interesting initial network 
calls: 

i sub configure { 



2 my($sock,$arg) = @_; 

3 my($lport,$rport,$laddr,$raddr,$proto,$type) ; 

4 
5 

6 $arg->{LocalAddr} = $arg->{LocalHost} 

7 if exists $arg->{LocalHost} kk ! exists $arg->{LocalAddr} ; 

8 

9 ($laddr,$lport,$proto) = _sock_inf o($arg->{LocalAddr}, 
10 $arg->{LocalPort} , 

n $arg->{Proto}) 
12 or return _error ($sock, $!, $0); 

13 

$laddr = defined $laddr ? inet_aton($laddr) 
is : INADDR_ANY; 

16 

return _error($sock, $EINVAL , "Bad hostname 1,1 , $arg->{LocalAddr} , ) 

is unless (defined $laddr) ; 

19 

20 $arg->{PeerAddr} = $arg->{PeerHost} 

21 if exists $arg->{PeerHost} kk ! exists $arg->{PeerAddr} ; 

22 

23 unless (exists $arg->{Listen}) { 

24 ($raddr,$rport,$proto) = _sock_inf o ($arg->{PeerAddr} , 

25 $arg->{PeerPort} , 

26 $proto) 

27 or return _error ($sock, $!, $@) ; 

28 } 

29 

30 $proto ||= _get_proto_number( 'tcp' ) ; 

31 

32 $type = $arg->{Type} | | $socket_type{lc _get_proto_name ($proto) } ; 

33 
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my Oraddr = () ; 

if (defined $raddr) { 

Oraddr = $sock->_get_addr ($raddr , $arg->{MultiHomed}) ; 
return _error($sock, $EINVAL, "Bad hostname "' , $arg->{PeerAddr} , ) 
unless Oraddr; 

} 

while (1) { 

$sock->socket(AF_INET, $type, $proto) or 
return _error ($sock, $!, "$!"); 

if (defined $arg->{Blocking}) { 

defined $sock->blocking($arg->{Blocking}) 

or return _error ($sock, $!, "$!"); 

} 

if ($arg->{Reuse> | | $arg->{ReuseAddr}) { 
$sock->sockopt (SO_REUSEADDR, 1) or 
return _error($sock, $!, "$!"); 

} 

if ($arg->{ReusePort>) { 

$sock->sockopt (S0_REUSEP0RT, 1) or 
return _error($sock, $!, "$!"); 

} 

if ($arg->{Broadcast>) { 

$sock->sockopt (S0_BR0ADCAST, 1) or 
return _error($sock, $!, "$!"); 

} 

if ($lport | | ($laddr ne INADDR_ANY) | | exists $arg->{Listen}) { 
$sock->bind($lport | | 0, $laddr) or 
return _error($sock, $!, "$!"); 

} 

if (exists $arg->{Listen}) { 

$sock->listen($arg->{Listen} II 5) or 
return _error ($sock, $!, "$!"); 
last ; 

} 

# don't try to connect unless we're given a PeerAddr 
last unless exists ($arg->{PeerAddr}) ; 



$raddr = shift Oraddr; 
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return _error ($sock, $EINVAL, 'Cannot determine remote port 1 ) 
84 unless ($rport || $type == SOCK_DGRAM || $type == SDCK_RAW) ; 

85 

86 last 

s? unless ($type == SDCK_ STREAM || defined $raddr) ; 

88 

89 return _error($sock, $EINVAL, "Bad hostname 1,1 , $arg->{PeerAddr} , ) 

90 unless defined $raddr; 

91 

92 undef $@; 

if ($sock->connect (pack_sockaddr_in($rport , $raddr))) { 

94 return $sock; 

95 } 

96 

97 return _error ($sock, $!, $@ || "Timeout") 

98 unless Oraddr; 

99 } 

100 

$sock; 



102 } 

Here we're dealing with actual network code. The first thing this func- 
tion does is assign all the values needed for the socket, then it calls 
_sock_info() on line 24. This function uses various string parsing util- 
ities to determine whether the protocol, host and port are valid. Here 
we're no longer talking about HTTP, but instead TCP, which is what HTTP 
rides over in the OSI model. The host should be a valid host name or 
IP address, and the port should be a valid port. HTTP runs on port 80, 
HTTPS on 443, and so on. TCP/IP is the basis for any stream connection 
over the Internet. 

Once that's done, line 30 calls _get_proto_number() to get the ac- 
tual number assigned to TCP, something we'll come back to in the next 
section. Similarly, line 37 converts a host name into an IP address if 
need be. The actual socket is created on line 44, which is a call to the 
function of the same name from IO: .'Socket. After that, various flags 
are set, potential errors are handled, and the socket is returned. 



2.6 Sixth layer: Kernel drivers 

So far we've gone through Perl code, but now, our code is talking 
directly to the system. But how is the operating system, whether it's 
Windows, OS X or Unix, actually sending bits over the network? The 
answer is the network driver. At this point, every driver will be differ- 
ent based on the OS you use along with your network card. This is 
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the beauty of APIs. Just like Perl modules give us APIs to their own 
functions, operating systems have system calls to talk to each type of 
hardware. 

We're going to go briefly over what happens at the system level on 
Linux, because the source code is freely available. For an in-depth 
lecture I suggest the Linux Kernel Networking [12] presentation by Rami 
Rosen. The source of the socket code is available in socket. c[13] in the 
Kernel source tree. Here is the code that the Kernel uses to allocate a 
socket to an application: 

static struct socket *sock_alloc(void) 
{ 

struct inode *inode; 
struct socket *sock; 

inode = new_inode_pseudo (sock_mnt->mnt_sb) ; 
if ( ! inode) 

return NULL; 

sock = S0CKET_I (inode) ; 

kmemcheck_annotate_bitf ield(sock, type) ; 
inode->i_ino = get_next_ino () ; 
inode->i_mode = S_IFSDCK | S_IRWXUG0; 
inode->i_uid = current_f suid() ; 
inode->i_gid = current_f sgid() ; 
inode->i_op = &sockf s_inode_ops ; 

this_cpu_add(sockets_in_use , 1); 
return sock; 

} 

On Linux, socket are linked to inodes, which is an index on the file 
system. The Kernel keeps track of these inodes. The this_cpu_add() 
function is simply a way to add the number of sockets to an internal 
list. Finally, connectQ is also defined in that file as a system call: 

SYSCALL_DEFINE3 (connect , int , fd, struct sockaddr user *, uservaddr, 

int , addrlen) 

{ 

struct socket *sock; 

struct sockaddr_storage address; 

int err, fput_needed; 

sock = sockf d_lookup_light (f d, &err, &fput_needed) ; 
if ( ! sock) 

goto out; 

err = move_addr_to_kernel (uservaddr, addrlen, feaddress) ; 
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if (err < 0) 

goto out_put; 

err = 

security_socket_connect (sock, (struct sockaddr *)&address, addrlen 
if (err) 

goto out_put; 

err = sock->ops->connect (sock, (struct sockaddr *)&address, addrlen, 

sock->f ile->f _f lags) ; 

out_put : 

f put_light (sock->f ile , fput_needed) ; 

out : 

return err; 

} 

The socket code is just part of the story however. Once the CPU knows 
how to accept socket calls, it needs to know what to send to the actual 
hardware, and that's done with network drivers. There are hundreds 
of drivers for everything from Ethernet cards, fiber optic connections, 
wireless, and so on. You can view the source of the fairly popular Intel 
PRO/100 Ethernet Card in el00.c[14] in the source tree. 

If you dig into that code, you might realize that abstraction doesn't 
end here. Take a look for example at the elOO_write_flush() function: 

static inline void elOO_write_f lush(struct nic *nic) 
{ 

(void) ioread8 (&nic->csr->scb . status) ; 

} 

Here you can see that the driver calls a function called ioread8() which 
is a Kernel call that is defined in iomap.h which in turn calls readb() 
based on the architecture that Linux runs on, whether it's x86, arm, 
alpha and so on. For example, here is an implementation of readbQ for 
the hexagon platform: 

static inline u8 readb (const volatile void iomem *addr) 

{ 

u8 val; 

asm volatile ( 

"°/oO = membC/l) ; 11 

: "=&r" (val) 

: "r" (addr) 

); 

return val; 

} 
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This is what manually copies each character, in the form of bytes, 
to and from the network hardware. Seeing as this is assembly code, 
we're officially as low on the abstraction stack as we can go. After that, 
it's nothing but assembly commands going back and forth between the 
operating system, the CPU and the various hardware in your machine. 



3 Network traffic 



So far, we've been through five different layers of Perl code and a 
sixth layer of Kernel functions in order to find out what a single line 
did. We went from parsing XML data, to fetching raw data on an HTTP 
connection from a web server, down to the actual network sockets used 
to read and write at the system level. Now, it's time to see what the 
data actually is when looked at directly on the network. 

To do this, I'll be using a packet capture utility to see exactly what 
is written on the socket by all of this code. First, this is the actual 
packet, in bytes (converted from binary to hexadecimal to make it more 
readable), as sent over the wire: 
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Oviously this is fairly pointless to us, so let's use a network utility to 
analyze it: 

Frame: Number = 183, Captured Frame Length = 217, MediaType = ETHERNET 
- Ethernet: Etype = Internet IP (IPv4) ,DestinationAddress : [00-01-96-6A-21-02] , 
SourceAddress : [00-04-23-44-1C-DD] 

- DestinationAddress: 000196 6A2102 [00-01-96-6A-21-02] 
Rsv: (001001..) 

UL: ( 0.) Universally Administered Address 

IG: ( 0) Individual address (unicast) 
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- SourceAddress : 000423 441CDD [00-04-23-44-1C-DD] 
Rsv: (000000..) 

UL: ( 0.) Universally Administered Address 

IG: ( 0) Individual address (unicast) 

EthernetType : Internet IP (IPv4), 2048(0x800) 

- Ipv4: Src = 192.168.0.5, Dest = 192.168.0.1, Next Protocol = TCP, Packet ID 
18472, Total IP Length = 203 

- Versions: IPv4, Internet Protocol; Header Length = 20 

Version: (0100....) IPv4, Internet Protocol 
HeaderLength: (....0101) 20 bytes (0x5) 

- DifferentiatedServicesField: DSCP: 0, ECN: 0 

DSCP: (000000..) Differentiated services codepoint 0 

ECT: ( 0.) ECN-Capable Transport not set 

CE: ( 0) ECN-CE not set 

TotalLength: 203 (OxCB) 
Identification: 18472 (0x4828) 

- FragmentFlags : 16384 (0x4000) 

Reserved: (0 ) 

DF: (.1 ) Do not fragment 

MF: (..0 ) This is the last fragment 

Offset: (...0000000000000) 0 
TimeToLive: 128 (0x80) 
NextProtocol: TCP, 6(0x6) 
Checksum: 13302 (0x33F6) 
SourceAddress: 192.168.0.5 
DestinationAddress: 192.168.0.1 

- Tcp: Flags=. . .AP. . . , SrcPort=64749 , DstPort=HTTP(80) , PayloadLen=163, 
Seq=1284423738 - 1284423901, Ack=163903221 , Win=256 (scale factor 0x8) = 65536 

SrcPort: 64749 
DstPort: HTTP (80) 

SequenceNumber : 1284423738 (0x4C8EC03A) 
AcknowledgementNumber : 163903221 (0x9C4F7Dl) 

- DataOffset: 80 (0x50) 

DataOffset: (0101....) 20 bytes 
Reserved: (....000.) 
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Checksum: OxCCBE, Disregarded 
UrgentPo inter: 0 (0x0) 

TCPPayload: SourcePort = 64749, DestinationPort = 80 
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- Http: Request, GET http://www.reddit.eom/.rss 
Command : GET 
- URI : http://www.reddit.eom/.rss 

Location : http : //www . reddit . com/ . rss 
ProtocolVersion: HTTP/1.1 
TE: deflate,gzip;q=0.3 
Connection: TE, close 
Accept-Encoding: gzip 
Host: www.reddit.com 
UserAgent : XML : : Feed/0 . 52 
HeaderEnd: CRLF 

This tree of information was generated by the Microsoft Network Mon- 
itor, but you can get such information from Wireshark, or any other 
packet capture utility. All of this represents a single packet. Needless 
to say, it would be quite a bit of work to generate something like that 
for each and every packet your application wants to send over the net- 
work, hence all the layers of abstraction we've been through. Let's take 
a look at what's contained here, so we can relate to the functions we've 
seen in the previous sections. 

On line 1, we see that this is an Ethernet frame, so we know the 
driver that handled this request is an Ethernet driver. On line 2, we 
have the start of the Ethernet header. This is entirely filled up by the 
driver itself, including the MAC addresses of the source and destination. 
Then on line 13, we have the header for the IP part of TCP/IP, namely 
the source and destination addresses. In our case, we're dealing with 
IPv4 addresses. You can see there are a lot of flags, most of which 
have default values, and those are assigned by the socket code in Perl 
modules. If you notice line 30, the protocol number for TCP is actually 
6, something we've seen in one of the previous layers. 

Line 34 starts the TCP part of TCP/IP, which defines a stream connec- 
tion. The destination port is 80, and then sequence numbers are shown, 
which is a way the system keeps track of packets. Line 46 starts TCP 
flags which are set by the various modules we've covered, again most 
are set to default values. Finally, we have the HTTP request starting 
at line 57. These lines are much higher in the stack than the previous 
parts. Here we have settings that can actually be set by accessible Perl 
functions. Line 57 has the URL of the HTTP request, and line 66 has the 
user agent, something we've seen as well, in our case set to the name 
of the library, XMLr.Feed, along with the version. 
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4 Conclusion 



In this experiment, we started with a single line of code, and went 
down through the various Perl modules, down to C code for the oper- 
ating system, and down onto the network to see exactly what went on 
from this one command. As you may have noticed, things get compli- 
cated very quickly. It's interesting to note that nothing we've seen is 
a black box, meaning that if you really wanted to, you could recreate 
the actual packet that was shown in the previous section. In Perl, that 
would require you to access the IOr.Socket module directly which isn't 
all that difficult to do, and there are even modules for deeper coding. If 
you're interested in socket coding in Perl I recommend the Perl Socket 
Programming tutorial[15]. 

Hopefully this has been enlightening, or at least entertaining. As you 
can see, abstraction is everywhere in modern day coding. This has a lot 
of advantages, but it's good to sometimes break through those layers 
and explore that lies beneath. 
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